Protocol Buffers
   HOME

TheInfoList



OR:

Protocol Buffers (Protobuf) is a
free and open-source Free and open-source software (FOSS) is a term used to refer to groups of software consisting of both free software and open-source software where anyone is freely licensed to use, copy, study, and change the software in any way, and the source ...
cross-platform In computing, cross-platform software (also called multi-platform software, platform-agnostic software, or platform-independent software) is computer software that is designed to work in several computing platforms. Some cross-platform software r ...
data format used to serialize structured data. It is useful in developing programs to communicate with each other over a network or for storing data. The method involves an
interface description language interface description language or interface definition language (IDL), is a generic term for a language that lets a program or object written in one language communicate with another program written in an unknown language. IDLs describe an inter ...
that describes the structure of some data and a program that generates source code from that description for generating or parsing a stream of bytes that represents the structured data.


Overview

Google Google LLC () is an American multinational technology company focusing on search engine technology, online advertising, cloud computing, computer software, quantum computing, e-commerce, artificial intelligence, and consumer electronics. ...
developed Protocol Buffers for internal use and provided a
code generator In computing, Code generation denotes software techniques or systems that generate program code which may then be used independently of the generator system in a runtime environment. Specific articles: * Code generation (compiler), a mechanism to pr ...
for multiple languages under an
open-source Open source is source code that is made freely available for possible modification and redistribution. Products include permission to use the source code, design documents, or content of the product. The open-source model is a decentralized sof ...
license (see below). The design goals for Protocol Buffers emphasized simplicity and performance. In particular, it was designed to be smaller and faster than
XML Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable ...
. Protocol Buffers are widely used at Google for storing and interchanging all kinds of structured information. The method serves as a basis for a custom
remote procedure call In distributed computing, a remote procedure call (RPC) is when a computer program causes a procedure (subroutine) to execute in a different address space (commonly on another computer on a shared network), which is coded as if it were a normal (l ...
(RPC) system that is used for nearly all inter-machine communication at Google. Protocol Buffers are similar to the
Apache Thrift Thrift is an interface definition language and binary communication protocol used for defining and creating services for numerous programming languages. It was developed at Facebook for "scalable cross-language services development" and as of ...
(used by Facebook,
Evernote Evernote is a note-taking and task management application. It is developed by the Evernote Corporation, headquartered in Redwood City, California. It is intended for archiving and creating notes in which photos, audio and saved web content can ...
),
Ion An ion () is an atom or molecule with a net electrical charge. The charge of an electron is considered to be negative by convention and this charge is equal and opposite to the charge of a proton, which is considered to be positive by conve ...
(created by Amazon), or Microsoft Bond protocols, offering as well a concrete RPC
protocol stack The protocol stack or network stack is an implementation of a computer networking protocol suite or protocol family. Some of these terms are used interchangeably but strictly speaking, the ''suite'' is the definition of the communication protoco ...
to use for defined services called
gRPC gRPC (Google Remote Procedure Calls) is a cross-platform open source high performance Remote procedure call, Remote Procedure Call (RPC) framework. gRPC was initially created by Google, which has used a single general-purpose RPC infrastructure ...
. Data structures (called ''messages'') and services are described in a proto definition file (.proto) and compiled with protoc. This compilation generates code that can be invoked by a sender or recipient of these data structures. For example, example.pb.cc and example.pb.h are generated from example.proto. They define
C++ C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
classes for each message and service in example.proto. Canonically, messages are serialized into a
binary Binary may refer to: Science and technology Mathematics * Binary number, a representation of numbers using only two digits (0 and 1) * Binary function, a function that takes two arguments * Binary operation, a mathematical operation that ta ...
wire Overhead power cabling. The conductor consists of seven strands of steel (centre, high tensile strength), surrounded by four outer layers of aluminium (high conductivity). Sample diameter 40 mm A wire is a flexible strand of metal. Wire is c ...
format which is compact, forward- and backward-compatible, but not
self-describing In computer programming, self-documenting (or self-describing) source code and user interfaces follow naming conventions and structured programming conventions that enable use of the system without prior specific knowledge. In web development, ...
(that is, there is no way to tell the names, meaning, or full datatypes of fields without an external specification). There is no defined way to include or refer to such an external specification (
schema The word schema comes from the Greek word ('), which means ''shape'', or more generally, ''plan''. The plural is ('). In English, both ''schemas'' and ''schemata'' are used as plural forms. Schema may refer to: Science and technology * SCHEMA ...
) within a Protocol Buffers file. The officially supported implementation includes an ASCII serialization format, but this format—though self-describing—loses the forward- and backward-compatibility behavior, and according to some, is thus not a good choice for applications other than debugging. Though the primary purpose of Protocol Buffers is to facilitate network communication, its simplicity and speed make Protocol Buffers an alternative to data-centric C++ classes and structs, especially where interoperability with other languages or systems might be needed in the future.


Example

A schema for a particular use of protocol buffers associates data types with field names, using integers to identify each field. (The protocol buffer data contains only the numbers, not the field names, providing some bandwidth/storage savings compared with systems that include the field names in the data.) // polyline.proto syntax = "proto2"; message Point message Line message Polyline The "Point" message defines two mandatory data items, ''x'' and ''y''. The data item ''label'' is optional. Each data item has a tag. The tag is defined after the equal sign. For example, ''x'' has the tag 1. The "Line" and "Polyline" messages, which both use Point, demonstrate how composition works in Protocol Buffers. Polyline has a ''repeated'' field, which behaves like a vector. This schema can subsequently be compiled for use by one or more programming languages. Google provides a compiler called protoc which can produce output for C++, Java or Python. Other schema compilers are available from other sources to create language-dependent output for over 20 other languages. For example, after a C++ version of the protocol buffer schema above is produced, a C++ source code file, polyline.cpp, can use the message objects as follows: // polyline.cpp #include "polyline.pb.h" // generated by calling "protoc polyline.proto" Line* createNewLine(const std::string& name) Polyline* createNewPolyline()


Language support

Protobuf 2.0 provides a
code generator In computing, Code generation denotes software techniques or systems that generate program code which may then be used independently of the generator system in a runtime environment. Specific articles: * Code generation (compiler), a mechanism to pr ...
for
C++ C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
,
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's List ...
, C#, and
Python Python may refer to: Snakes * Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia ** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia * Python (mythology), a mythical serpent Computing * Python (pro ...
. Protobuf 3.0 provides a code generator for
C++ C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
,
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's List ...
(including JavaNano, a dialect intended for low-resource environments),
Python Python may refer to: Snakes * Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia ** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia * Python (mythology), a mythical serpent Computing * Python (pro ...
, Go,
Ruby A ruby is a pinkish red to blood-red colored gemstone, a variety of the mineral corundum ( aluminium oxide). Ruby is one of the most popular traditional jewelry gems and is very durable. Other varieties of gem-quality corundum are called sa ...
,
Objective-C Objective-C is a general-purpose, object-oriented programming language that adds Smalltalk-style messaging to the C programming language. Originally developed by Brad Cox and Tom Love in the early 1980s, it was selected by NeXT for its NeXTS ...
, C#. It also supports JavaScript since 3.0.0-beta-2. Third-party implementations are also available for
Ballerina A ballet dancer ( it, ballerina fem.; ''ballerino'' masc.) is a person who practices the art of classical ballet. Both females and males can practice ballet; however, dancers have a strict hierarchy and strict gender roles. They rely on yea ...
, C,
C++ C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
,
Dart Dart or DART may refer to: * Dart, the equipment in the game of darts Arts, entertainment and media * Dart (comics), an Image Comics superhero * Dart, a character from ''G.I. Joe'' * Dart, a ''Thomas & Friends'' railway engine character * Dar ...
,
Elixir ELIXIR (the European life-sciences Infrastructure for biological Information) is an initiative that will allow life science laboratories across Europe to share and store their research data as part of an organised network. Its goal is to bring t ...
, Erlang,
Haskell Haskell () is a general-purpose, statically-typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research and industrial applications, Haskell has pioneered a number of programming lan ...
,
JavaScript JavaScript (), often abbreviated as JS, is a programming language that is one of the core technologies of the World Wide Web, alongside HTML and CSS. As of 2022, 98% of Website, websites use JavaScript on the Client (computing), client side ...
,
Perl Perl is a family of two high-level, general-purpose, interpreted, dynamic programming languages. "Perl" refers to Perl 5, but from 2000 to 2019 it also referred to its redesigned "sister language", Perl 6, before the latter's name was offici ...
,
PHP PHP is a general-purpose scripting language geared toward web development. It was originally created by Danish-Canadian programmer Rasmus Lerdorf in 1993 and released in 1995. The PHP reference implementation is now produced by The PHP Group. ...
,
Prolog Prolog is a logic programming language associated with artificial intelligence and computational linguistics. Prolog has its roots in first-order logic, a formal logic, and unlike many other programming languages, Prolog is intended primarily ...
, R,
Rust Rust is an iron oxide, a usually reddish-brown oxide formed by the reaction of iron and oxygen in the catalytic presence of water or air moisture. Rust consists of hydrous iron(III) oxides (Fe2O3·nH2O) and iron(III) oxide-hydroxide (FeO(OH ...
, Scala,
Swift Swift or SWIFT most commonly refers to: * SWIFT, an international organization facilitating transactions between banks ** SWIFT code * Swift (programming language) * Swift (bird), a family of birds It may also refer to: Organizations * SWIFT, ...
,
Julia Julia is usually a feminine given name. It is a Latinate feminine form of the name Julio and Julius. (For further details on etymology, see the Wiktionary entry "Julius".) The given name ''Julia'' had been in use throughout Late Antiquity (e.g ...
and Nim.


See also

*
gRPC gRPC (Google Remote Procedure Calls) is a cross-platform open source high performance Remote procedure call, Remote Procedure Call (RPC) framework. gRPC was initially created by Google, which has used a single general-purpose RPC infrastructure ...
*
Comparison of data-serialization formats This is a comparison of data serialization formats, various ways to convert complex objects to sequences of bits. It does not include markup languages used exclusively as document file formats. Overview Syntax comparison of human-readable form ...
* FlatBuffers *
ASN.1 Abstract Syntax Notation One (ASN.1) is a standard interface description language for defining data structures that can be serialized and deserialized in a cross-platform way. It is broadly used in telecommunications and computer networking, and ...


References


External links


Official documentation
at developers.google.com * {{Google FOSS Data serialization formats Google software Software using the BSD license